Given a natural language description of a user's demand, the NL2Code task aims to generate code that addresses that demand. This is a critical but challenging task that mirrors the capabilities of AI-powered programming. The NL2Code task is inherently versatile, diverse, and complex; for example, a demand can be described in different languages, in different formats, and at different levels of granularity. This motivated us to conduct this survey of NL2Code. In this survey, we focus on how neural networks (NNs) solve NL2Code. We first propose a comprehensive framework that covers all studies in this field. Then, we parse the existing studies into this framework in depth. We also create an online website to record the parsing results, which tracks existing studies and recent progress on NL2Code. In addition, we summarize the current challenges of NL2Code as well as its future directions. We hope that this survey can foster the evolution of this field.
With the advent of ubiquitous deployment of smart devices and the Internet of Things, the data sources for machine learning inference have increasingly shifted to the edge of the network. Existing machine learning inference platforms typically assume a homogeneous infrastructure and do not take into account the more complex and tiered computing infrastructure that includes edge devices, local hubs, edge data centers, and cloud data centers. On the other hand, recent AutoML work offers viable solutions for model compression, pruning, and quantization in heterogeneous environments; for a machine learning model, we can now easily find, or even generate, a series of models with different trade-offs between accuracy and efficiency. We design and implement JellyBean, a system for serving and optimizing machine learning inference workflows. Given service-level objectives (e.g., throughput, accuracy), JellyBean selects the most cost-effective models that meet the accuracy target and decides how to deploy them across the different tiers of the infrastructure. Evaluations show that, compared with state-of-the-art model-selection and worker-assignment solutions, JellyBean reduces the total serving cost by up to 58% for visual question answering and by up to 36% for vehicle tracking from the NVIDIA AI City Challenge. JellyBean also outperforms prior ML serving systems (e.g., Spark on the cloud) by up to 5x in serving cost.
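To make the selection step concrete, here is a minimal Python sketch, assuming a hypothetical catalog of model variants with known accuracy/cost profiles, that picks the cheapest variant meeting an accuracy service-level objective. JellyBean's actual optimizer additionally plans worker assignment across infrastructure tiers:

```python
# Illustrative sketch only: variant names, accuracies, and cost units are
# hypothetical, not from the JellyBean paper.
from dataclasses import dataclass

@dataclass
class ModelVariant:
    name: str
    accuracy: float   # expected accuracy on the target workload
    cost: float       # serving cost per 1k requests (hypothetical unit)

def select_model(variants: list[ModelVariant], accuracy_slo: float) -> ModelVariant:
    """Return the most cost-effective variant meeting the accuracy target."""
    feasible = [v for v in variants if v.accuracy >= accuracy_slo]
    if not feasible:
        raise ValueError("no variant satisfies the accuracy SLO")
    return min(feasible, key=lambda v: v.cost)

variants = [
    ModelVariant("resnet18-int8", accuracy=0.71, cost=1.0),
    ModelVariant("resnet50-fp16", accuracy=0.76, cost=2.4),
    ModelVariant("vit-base",      accuracy=0.81, cost=6.8),
]
print(select_model(variants, accuracy_slo=0.75).name)  # resnet50-fp16
```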
Code generation is a longstanding challenge that aims to generate a code snippet from a natural language description. Usually, expensive text-code paired data is essential for training a code generation model. Recently, thanks to the success of pre-training techniques, large language models have been trained on large-scale unlabeled code corpora and perform well in code generation. In this paper, we investigate how to leverage an unlabeled code corpus to train a model for library-oriented code generation. Since it is a common practice for programmers to reuse third-party libraries, text-code paired data is hard to obtain due to the huge number of libraries. We observe that library-oriented code snippets are more likely to share similar code sketches. Hence, we present CERT with two steps: a sketcher generates the sketch, and then a generator fills in the details in the sketch. Both the sketcher and the generator are continually pre-trained upon a base model using unlabeled data. Furthermore, we craft two benchmarks named PandasEval and NumpyEval to evaluate library-oriented code generation. Experimental results demonstrate the impressive performance of CERT. For example, it surpasses the base model by an absolute improvement of 15.67% in terms of pass@1 on PandasEval. Our work is available at https://github.com/microsoft/pycodegpt.
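To make the two-step pipeline concrete, here is a minimal Python sketch in which a sketcher drafts a skeleton and a generator fills in its details; `generate_with_model` is a hypothetical stand-in for the underlying decoder calls:

```python
# Illustrative sketch only: the helper below is a stub, not CERT's API.
def generate_with_model(model_name: str, prompt: str) -> str:
    # Placeholder for a real decoder call (e.g., model.generate in Transformers).
    return f"<{model_name} output for: {prompt[:40]}...>"

def sketcher(nl_description: str) -> str:
    """Stage 1: produce an anonymized code sketch (user-defined names and
    literals masked, so similar library usages share the same sketch)."""
    return generate_with_model("sketcher", nl_description)

def generator(nl_description: str, sketch: str) -> str:
    """Stage 2: fill in the sketch's placeholders with concrete details."""
    return generate_with_model("generator", f"{nl_description}\n# sketch:\n{sketch}")

def cert_generate(nl_description: str) -> str:
    return generator(nl_description, sketcher(nl_description))

print(cert_generate("read a CSV into a DataFrame and drop NaN rows"))
```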
Skeleton-based action recognition methods are limited by the semantic extraction of spatio-temporal skeleton graphs. However, current methods struggle to effectively combine features from the temporal and spatial graph dimensions, tending to be thick on one side and thin on the other. In this paper, we propose a Temporal-Channel Aggregation Graph Convolutional Network (TCA-GCN) to dynamically and efficiently learn spatial and temporal topologies over different temporal and channel dimensions for skeleton-based action recognition. We use a temporal aggregation module to learn temporal-dimension features and a channel aggregation module to efficiently combine spatial dynamic channel-topology features with temporal dynamic topology features. In addition, we extract multi-scale skeleton features in temporal modeling and fuse them with an attention mechanism. Extensive experiments show that our model outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
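As a rough illustration of channel-wise dynamic topology in a skeletal GCN layer, here is a minimal PyTorch sketch in which a shared joint adjacency is refined per output channel from temporally pooled features before graph convolution. The module name, shapes, and refinement rule are illustrative assumptions, not the paper's exact TCA-GCN design:

```python
import torch
import torch.nn as nn

class ChannelTopologyGCN(nn.Module):
    """Hypothetical layer: shared static topology + per-channel dynamic refinement."""
    def __init__(self, in_ch: int, out_ch: int, num_joints: int):
        super().__init__()
        self.A = nn.Parameter(torch.eye(num_joints))   # shared static topology
        self.theta = nn.Conv2d(in_ch, out_ch, 1)
        self.phi = nn.Conv2d(in_ch, out_ch, 1)
        self.proj = nn.Conv2d(in_ch, out_ch, 1)
        self.alpha = nn.Parameter(torch.zeros(1))      # refinement strength

    def forward(self, x):                 # x: (B, C, T, V)
        # per-channel dynamic topology from temporally pooled features
        q = self.theta(x).mean(dim=2)     # (B, C', V)
        k = self.phi(x).mean(dim=2)       # (B, C', V)
        dyn = torch.tanh(q.unsqueeze(-1) - k.unsqueeze(-2))  # (B, C', V, V)
        A = self.A + self.alpha * dyn     # refine the shared topology per channel
        v = self.proj(x)                  # (B, C', T, V)
        return torch.einsum("bctv,bcvw->bctw", v, A)

x = torch.randn(2, 3, 64, 25)             # batch, channels, frames, joints
out = ChannelTopologyGCN(3, 16, 25)(x)    # -> (2, 16, 64, 25)
```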
Federated learning (FL) has been applied in many fields since it was proposed, such as credit assessment, medical care, etc. Due to differences in network or computing resources, clients may not update their gradients at the same time, which can incur waiting or idle time. This is why asynchronous federated learning (AFL) methods are needed. The major bottleneck in AFL is communication, and finding a balance between model performance and communication cost is a key challenge for AFL. This paper proposes a new AFL framework, VAFL. We verify the performance of the algorithm through sufficient experiments. The experiments show that VAFL can reduce communication time by about 51.02% with an average communication compression rate of 48.23%, and allows the model to converge faster. The code is available at https://github.com/robai-lab/vafl
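As a rough illustration of asynchronous aggregation with communication compression, here is a minimal Python sketch in which the server applies each client update on arrival, damped by staleness, after top-k sparsification. Both the staleness rule and the compressor are assumptions for illustration; the abstract does not specify VAFL's exact mechanisms:

```python
# Illustrative sketch only, not the VAFL algorithm.
import numpy as np

def top_k_compress(update: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k largest-magnitude entries (sparse compression)."""
    out = np.zeros_like(update)
    idx = np.argsort(np.abs(update))[-k:]
    out[idx] = update[idx]
    return out

class AsyncServer:
    """Applies updates as they arrive, without waiting for all clients."""
    def __init__(self, dim: int, base_lr: float = 0.5):
        self.weights = np.zeros(dim)
        self.version = 0
        self.base_lr = base_lr

    def apply(self, client_update: np.ndarray, client_version: int):
        staleness = self.version - client_version
        lr = self.base_lr / (1 + staleness)   # damp stale updates
        self.weights += lr * client_update
        self.version += 1

server = AsyncServer(dim=10)
update = top_k_compress(np.random.randn(10), k=3)  # compressed before upload
server.apply(update, client_version=0)
```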
Increasing concerns about data privacy and security have driven the emerging field of privacy-preserving machine learning from isolated data sources, i.e., federated learning. One class of federated learning, vertical federated learning, in which different parties hold different features of common users, has the potential to enable various business collaborations among enterprises in many fields. In machine learning, decision tree ensembles such as gradient boosting decision trees (GBDT) and random forests are widely applied powerful models with high interpretability and modeling efficiency. However, state-of-the-art vertical federated learning frameworks adopt anonymous features to avoid possible data breaches, which compromises the interpretability of the models. To address this issue in the inference process, in this paper we first conduct a problem analysis of the necessity of disclosing the meanings of features to the guest party in vertical federated learning. We then find that a tree's prediction result can be expressed as the intersection of the results of the sub-models of the tree held by all parties. Exploiting this key observation, we protect data privacy and allow the disclosure of feature meanings by concealing decision paths, and we adopt a communication-efficient secure computation method for the inference outputs. The advantages of Fed-EINI are demonstrated through both theoretical analysis and extensive numerical results. We improve the interpretability of the model by disclosing the meanings of features while ensuring efficiency and accuracy.
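To illustrate the key observation, here is a minimal Python sketch with a toy two-party tree: each party computes the set of leaves consistent with the features it holds, and the intersection of those sets identifies the prediction leaf. The tree, features, and `candidate_leaves` helper are hypothetical:

```python
# Illustrative sketch only: a toy vertically partitioned decision tree.
def candidate_leaves(tree: dict, sample: dict) -> set:
    """Leaves reachable given only the features this party holds.

    tree: {"leaf": id} or {"feature": f, "threshold": t, "left": ..., "right": ...}
    sample: partial feature dict; unknown features leave both branches open.
    """
    if "leaf" in tree:
        return {tree["leaf"]}
    f, t = tree["feature"], tree["threshold"]
    if f in sample:  # this party owns the split feature: follow one branch
        branch = tree["left"] if sample[f] < t else tree["right"]
        return candidate_leaves(branch, sample)
    # unknown feature: both subtrees remain possible
    return candidate_leaves(tree["left"], sample) | candidate_leaves(tree["right"], sample)

tree = {"feature": "age", "threshold": 30,
        "left": {"leaf": 0},
        "right": {"feature": "income", "threshold": 50,
                  "left": {"leaf": 1}, "right": {"leaf": 2}}}

party_a = candidate_leaves(tree, {"age": 42})     # owns "age"    -> {1, 2}
party_b = candidate_leaves(tree, {"income": 80})  # owns "income" -> {0, 2}
print(party_a & party_b)                          # {2}: the prediction leaf
```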
A recent study has shown a phenomenon called neural collapse, in which the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
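As a rough illustration of such a regularizer, here is a minimal PyTorch sketch that pushes the pairwise cosine similarities of class feature centers toward the simplex-ETF value of -1/(K-1), keeping centers equiangular and maximally separated. This is an assumed formulation for illustration, not the paper's exact loss:

```python
# Illustrative sketch only, not the paper's regularizer.
import torch
import torch.nn.functional as F

def etf_regularizer(centers: torch.Tensor) -> torch.Tensor:
    """centers: (K, d) per-class feature means."""
    K = centers.shape[0]
    c = F.normalize(centers, dim=1)
    cos = c @ c.t()                                  # (K, K) cosine matrix
    # simplex-ETF target: 1 on the diagonal, -1/(K-1) off-diagonal
    target = torch.full_like(cos, -1.0 / (K - 1))
    target.fill_diagonal_(1.0)
    return ((cos - target) ** 2).mean()

centers = torch.randn(20, 128, requires_grad=True)
loss = etf_regularizer(centers)
loss.backward()  # gradients pull centers toward the equiangular structure
```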
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. Besides, the use of CAM also brings a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
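As a rough illustration of balancing two teachers, here is a minimal PyTorch sketch that mixes KL-based distillation losses from a classification teacher and a localization teacher with a weight alpha. The loss form and weighting are illustrative assumptions; the actual framework distills from de-biased causal features:

```python
# Illustrative sketch only, not the KD-CI-CAM training objective.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, cls_teacher_logits, loc_teacher_logits,
                          alpha: float = 0.5, T: float = 4.0) -> torch.Tensor:
    """KL distillation against two teachers, mixed by alpha."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    q_cls = F.softmax(cls_teacher_logits / T, dim=1)
    q_loc = F.softmax(loc_teacher_logits / T, dim=1)
    kd_cls = F.kl_div(log_p, q_cls, reduction="batchmean") * T * T
    kd_loc = F.kl_div(log_p, q_loc, reduction="batchmean") * T * T
    return alpha * kd_cls + (1 - alpha) * kd_loc  # balance the two teachers

s = torch.randn(8, 200)
loss = multi_teacher_kd_loss(s, torch.randn(8, 200), torch.randn(8, 200))
```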
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, rendering predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for policy pre-training in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
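As a rough illustration of the photometric supervision in the second stage, here is a minimal PyTorch sketch of the common L1 + SSIM photometric error between a synthesized (warped) frame and the real target frame, in the style of the self-supervised depth literature. The warped image is assumed to come from a differentiable view-synthesis step not shown here, and the exact weighting is an assumption:

```python
# Illustrative sketch only, not PPGeo's exact objective.
import torch
import torch.nn.functional as F

def photometric_error(warped: torch.Tensor, target: torch.Tensor,
                      alpha: float = 0.85) -> torch.Tensor:
    """warped, target: (B, 3, H, W) images in [0, 1]."""
    l1 = (warped - target).abs().mean(dim=1, keepdim=True)
    # local statistics via 3x3 average pooling (simple SSIM variant)
    mu_w = F.avg_pool2d(warped, 3, 1, 1)
    mu_t = F.avg_pool2d(target, 3, 1, 1)
    sigma_w = F.avg_pool2d(warped ** 2, 3, 1, 1) - mu_w ** 2
    sigma_t = F.avg_pool2d(target ** 2, 3, 1, 1) - mu_t ** 2
    sigma_wt = F.avg_pool2d(warped * target, 3, 1, 1) - mu_w * mu_t
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_w * mu_t + c1) * (2 * sigma_wt + c2)) / \
           ((mu_w ** 2 + mu_t ** 2 + c1) * (sigma_w + sigma_t + c2))
    ssim = ssim.mean(dim=1, keepdim=True)
    return (alpha * (1 - ssim) / 2 + (1 - alpha) * l1).mean()

loss = photometric_error(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```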
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
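As a rough illustration of caption grounding, here is a minimal PyTorch sketch of a MIL-style alignment loss in which each caption noun embedding is pulled toward its best-matching mask-query embedding. This formulation is an assumption for illustration; the paper's grounding loss performs both explicit and implicit multi-modal alignment:

```python
# Illustrative sketch only, not the CGG grounding loss.
import torch
import torch.nn.functional as F

def grounding_loss(noun_emb: torch.Tensor, query_emb: torch.Tensor,
                   tau: float = 0.07) -> torch.Tensor:
    """noun_emb: (N, d) caption-noun features; query_emb: (Q, d) mask queries."""
    n = F.normalize(noun_emb, dim=1)
    q = F.normalize(query_emb, dim=1)
    sim = n @ q.t() / tau                 # (N, Q) scaled cosine similarities
    # MIL-style: each noun's best-matching query serves as its positive
    return F.cross_entropy(sim, sim.argmax(dim=1))

loss = grounding_loss(torch.randn(5, 256), torch.randn(100, 256))
```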